Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks. We show that this key benefit arises because, at initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor on the order of the square root of the network depth. This ensures that, early in training, the function computed by normalized residual blocks in deep networks is close to the identity function (on average). We use this insight to develop a simple initialization scheme that can train deep residual networks without normalization. We also provide a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.
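To make the central claim concrete, here is a minimal NumPy sketch (not taken from the paper; the linear residual blocks and dimensions are illustrative assumptions) that tracks activation variance in a toy residual network at initialization. With batch normalization the residual branch has variance of order one while the skip path's variance grows with the block index, so the branch is downscaled relative to the skip by roughly one over the square root of the depth; without normalization the variance roughly doubles per block.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, width, depth = 1024, 256, 100

def run(normalize):
    x = rng.standard_normal((batch, width))
    skip_var, branch_var = [], []
    for _ in range(depth):
        # Residual branch: a random linear map scaled to preserve variance
        # (He/Glorot-style init); a stand-in for a conv/ReLU block.
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        f = x @ W
        if normalize:
            # Batch normalization at initialization: zero mean, unit variance
            # per feature (learned scale and shift start at 1 and 0).
            f = (f - f.mean(0)) / f.std(0)
        skip_var.append(x.var())
        branch_var.append(f.var())
        x = x + f
    return np.array(skip_var), np.array(branch_var)

skip, branch = run(normalize=True)
# With BN the branch variance stays near 1 while the skip variance grows like
# the block index, so the branch is downscaled by ~1/sqrt(depth) in std.
print("normalized:   branch/skip std ratio at final block ~",
      np.sqrt(branch[-1] / skip[-1]))

skip_u, _ = run(normalize=False)
# Without BN both paths have comparable variance, and the activation variance
# roughly doubles every block, i.e. it grows exponentially with depth.
print("unnormalized: activation variance at final block ~", skip_u[-1])
```

Run with depth 100, the normalized network reports a branch-to-skip ratio near 0.1 (i.e. 1/sqrt(100)), while the unnormalized network's activation variance blows up to around 2^100, matching the mechanism the abstract describes.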
Review for NeurIPS paper: Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Weaknesses: * There might be multiple reasons why networks with BN are trainable under extreme conditions such as large learning rates and very large depth. I agree with the point made by this work, that a small initialization of the residual branches is one such reason, and that it in turn makes vanilla ResNets without normalization trainable. However, it is possible that normalized ResNets are trainable even without a small initialization of the residual branches. It is well known that the input/output scale of the weights preceding batch normalization does not carry the same meaning as it does in networks without normalization. For example, Li & Arora (2019) show that a slightly modified ResNet is trainable with an exponentially increasing learning rate and achieves performance equal to a step-decay schedule. The outputs of the residual blocks could likewise grow exponentially, yet the network would remain trainable because the gradients become small (see the sketch below).
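The reviewer's point rests on the scale invariance of batch normalization with respect to the weights that feed into it. A minimal PyTorch sketch of this property (illustrative only, not from the paper or the review; the layer sizes and loss are arbitrary): the forward pass is unchanged when the pre-BN weights are rescaled, while the gradient with respect to those weights shrinks as their scale grows, so even exponentially growing weights or branch outputs need not make training diverge.

```python
import torch

torch.manual_seed(0)
x = torch.randn(512, 64)
W = torch.randn(64, 64) / 8.0  # fan-in scaling, 1/sqrt(64)

def bn(z):
    # Batch normalization at initialization (no learned scale/shift).
    return (z - z.mean(0)) / z.std(0)

# The forward pass is invariant to the scale of the pre-BN weights.
out1 = bn(x @ W)
out2 = bn(x @ (10.0 * W))
print(torch.allclose(out1, out2, atol=1e-4))  # True

# The gradient w.r.t. the weights shrinks as the weight scale grows,
# so a branch whose weights (or outputs) grow can remain trainable.
for scale in (1.0, 10.0, 100.0):
    Ws = (scale * W).detach().requires_grad_(True)
    loss = bn(x @ Ws).pow(2).mean()  # an arbitrary scale-invariant loss
    loss.backward()
    print(f"weight scale {scale:6.1f} -> grad norm {Ws.grad.norm().item():.6f}")
```

The printed gradient norms fall roughly as one over the weight scale, which is the sense in which the reviewer argues that the raw scale of pre-BN weights is not the decisive factor for trainability.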